SYSTEM AND METHOD FOR COMPRESSING PORTIONS OF A 
MEDIA SIGNAL USING DIFFERENT CODECS 



Cross-Reference to Related Applications 
[0001] This application is a continuation-in-part of U.S. Patent Application No. 
10/256,866, filed September 26, 2002, which claims the benefit of Provisional 
Application No. 60/325,483, filed September 26, 2001, both of which are incorporated 
herein by reference. 

Technical Field 

[0002] The present invention relates generally to the field of data compression. 
More specifically, the present invention relates to techniques for optimizing the 
compression of video and audio signals. 

Background of the Invention 
[0003] In the communication age, bandwidth is money. Video and audio signals 
(hereinafter "media signals") consume enormous amounts of bandwidth depending on 
the desired transmission quality. As a result, data compression is playing an 
increasingly important role in communication. 

[0004] Conventionally, the parties to a communication decide on a particular codec 
(compressor/decompressor) for compressing and decompressing media signals. A 
wide variety of codecs are available. General classifications of codecs include discrete 
cosine transform (DCT) or "block" codecs, fractal codecs, and wavelet codecs. 

[0005] Some codecs are "lossless," meaning that no data is lost during the 
compression process. A compressed media signal, after being received and 
decompressed by a lossless codec, is identical to the original. However, most 
commercially-available codecs are "lossy" and result in some degradation of the original 
media signal. 

[0006] For lossy codecs, compression "quality" (i.e., how similar a compressed 
media signal is to the original after decompression) varies substantially from codec to 
codec, and may depend, for instance, on the amount of available bandwidth, the quality 
of the communication line, characteristics of the media signal, etc. Another 
compression metric, i.e., performance, relates to the amount of bandwidth required to 
transmit the compressed signal as opposed to the original signal. Typically, lossy 
codecs result in better performance than lossless codecs, which is why they are 
preferred in most applications. 

[0007] Codec designers generally attempt to fashion codecs that produce high 
quality compressed output across a wide range of operating parameters. Although 
some codecs, such as MPEG-2, have gained widespread acceptance because of their 
general usefulness, no codec is ideally suited to all purposes. Each codec has 
individual strengths and weaknesses. 

[0008] Conventionally, the same codec is used to compress and decompress a 
media signal during the entire communication session or uniformly across a storage 
medium (e.g., DVD). However, a media signal is not a static quantity. A video signal, 
for example, may change substantially from scene to scene. Likewise, the available 
bandwidth or line quality may change during the course of a communication. Selecting 
the wrong codec at the outset can be a costly mistake in terms of the bandwidth 
required to transmit or store the media signal. 

Brief Description of the Drawings 
[0009] FIG. 1 is a block diagram of a conventional communication system using data 
compression; 

[0010] FIG. 2 is a block diagram of a communication system using multiple codecs 
for compressing portions of a media signal according to an embodiment of the 
invention; 

[0011] FIG. 3 is a detailed block diagram of a source system according to a first 
embodiment of the invention; 

[0012] FIG. 4 is a detailed block diagram of a source system according to a second 
embodiment of the invention; 

[0013] FIG. 5 is a detailed block diagram of a selection module; 

[0014] FIG. 6 is a data flow diagram of a process for automatically selecting a codec; 

[0015] FIG. 7 is a detailed block diagram of an artificial intelligence system; 

[0016] FIG. 8 is a data flow diagram of a process for automatically selecting settings 
for a codec; 

[0017] FIG. 9 is a block diagram of a comparison module showing the introduction of 
a licensing cost factor; and 

[0018] FIG. 10 is a block diagram of a process for modifying a target data rate. 

Detailed Description 

[0019] Reference is now made to the figures in which like reference numerals refer 
to like elements. For clarity, the first digit of a reference numeral indicates the figure 
number in which the corresponding element is first used. 

[0020] In the following description, numerous specific details of programming, 
software modules, user selections, network transactions, database queries, database 
structures, etc., are provided for a thorough understanding of the embodiments of the 
invention. However, those skilled in the art will recognize that the invention can be 
practiced without one or more of the specific details, or with other methods, 
components, materials, etc. 

[0021] In some cases, well-known structures, materials, or operations are not shown 
or described in detail in order to avoid obscuring aspects of the invention. Furthermore, 
the described features, structures, or characteristics may be combined in any suitable 
manner in one or more embodiments. 

[0022] FIG. 1 is a block diagram of a conventional system 100 for communicating 
media signals from a source system 102 to a destination system 104. The source and 
destination systems 102, 104 may be variously embodied, for example, as personal 
computers (PCs), cable or satellite set-top boxes (STBs), or video-enabled portable 
devices, such as personal digital assistants (PDAs) or cellular telephones. 
[0023] Within the source system 102, a video camera 106 or other device captures 
an original media signal 108. A codec (compressor/decompressor) 110 processes the 
original media signal 108 to create a compressed media signal 112, which may be 
delivered to the destination system 104 via a network 114, such as a local area network 
(LAN) or the Internet. Alternatively, the compressed media signal 112 could be written 
to a storage medium, such as a CD, DVD, flash memory device, or the like. 
[0024] At the destination system 104, the same codec 110 processes the 
compressed media signal 112 received through the network 114 to generate a 
decompressed media signal 116. The destination system 104 then presents the 
decompressed media signal 116 on a display device 118, such as a television or 
computer monitor. 

[0025] Conventionally, the source system 102 uses a single codec 110 to process 
the entire media signal 108 during a communication session or for a particular storage 
medium. However, as noted above, a media signal is not a static quantity. Video 
signals may change substantially from scene to scene. A single codec, which may 
function well under certain conditions, may not fare so well under different conditions. 
Changes in available bandwidth, line conditions, or characteristics of the media signal, 
itself, may drastically change the compression quality to the point that a different codec 
may do much better. In certain cases, a content developer may be able to manually 
specify a change of codec 110 within a media signal 108 where, for instance, the 
content developer knows that one codec 110 may be superior to another codec 110. 
However, this requires significant human effort and cannot be performed in real time. 
[0026] FIG. 2 is a block diagram of an alternative system 200 for communicating 
media signals from a source system 202 to a destination system 204 according to an 
embodiment of the present invention. As before, the source system 202 receives an 
original media signal 108 captured by a video camera 106 or other suitable device. 
[0027] However, unlike the system 100 of FIG. 1, the depicted system 200 is not 
limited to using a single codec 110 during a communication session or for a particular 
storage medium. Rather, as described in greater detail below, each scene 206 or 
segment of the original media signal 108 may be compressed using one of a plurality of 
codecs 110. A scene 206 may include one or more frames of the original media signal 
108. In the case of video signals, a frame refers to a single image in a sequence of 
images. More generally, however, a frame refers to a packet of information used for 
communication. 

[0028] As used herein, a scene 206 may correspond to a fixed segment of the media 
signal 108, e.g., two seconds of audio/video or a fixed number of frames. In other 
embodiments, however, a scene 206 may be defined by characteristics of the original 
media signal 108, i.e., a scene 206 may include two or more frames sharing similar 
characteristics. When one or more characteristics of the original media signal 108 
changes beyond a preset threshold, the source system 202 may detect the beginning of 
a new scene 206. Thus, while the video camera 106 focuses on a static object, a scene 
206 may last until the camera 106, the object, or both are moved. 
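By way of illustration only, the threshold-based detection of a new scene 206 described above might be sketched as follows. The function name, the use of a single numeric feature per frame, and the comparison against the immediately preceding frame are assumptions made for this sketch; the source does not specify which characteristics are compared or how.

```python
from typing import List, Sequence


def split_into_scenes(frame_features: Sequence[float], threshold: float) -> List[List[int]]:
    """Group frame indices into scenes.

    A new scene begins whenever the (hypothetical) per-frame feature differs
    from the previous frame's feature by more than `threshold`.
    """
    scenes: List[List[int]] = []
    for index, value in enumerate(frame_features):
        starts_new_scene = (
            not scenes or abs(value - frame_features[index - 1]) > threshold
        )
        if starts_new_scene:
            scenes.append([index])
        else:
            scenes[-1].append(index)
    return scenes


# Example: average brightness per frame (illustrative values only).
brightness = [0.50, 0.51, 0.49, 0.90, 0.91, 0.20]
print(split_into_scenes(brightness, threshold=0.2))
```

A fixed-length segmentation, such as a fixed number of frames per scene 206, would simply replace the threshold test with a frame counter.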
[0029] As illustrated, two adjacent scenes 206 within the same media signal 108 
may be compressed using different codecs 110. The codecs 110 may be of the same 
general type, e.g., discrete cosine transform (DCT), or of different types. For example, 
one codec 110a may be a DCT codec, while another codec 110b is a fractal codec, and 
yet another codec 110c is a wavelet codec. 

[0030] Unlike conventional systems 100, the system 200 of FIG. 2 automatically 
selects, from the available codecs 110, a particular codec 110 best suited to 
compressing each scene 206. The selection process is described in greater 
detail below. Briefly, however, the system 200 "remembers" which codecs 110 are used 
for scenes 206 having particular characteristics. If a subsequent scene 206 is 
determined to have the same characteristics, the same codec 110 is used. However, if 
a scene 206 is found to have substantially different characteristics from those previously 
observed, the system 200 tests various codecs 110 on the scene 206 and selects the 
codec 110 producing the highest compression quality (i.e., how similar the compressed 
media signal 210 is to the original signal 108 after decompression) for a particular target 
data rate. 

[0031] In addition, the source system 202 reports to the destination system 204 
which codec 110 was used to compress each scene 206. As illustrated, this may be 
accomplished by associating codec identifiers 208 with each scene 206 in the resulting 
compressed media signal 210. The codec identifiers 208 may precede each scene 206, 
as shown, or could be sent as a block at some point during the transmission. The 
precise format of the codec identifiers 208 is not crucial to the invention and may be 
implemented using standard data structures known to those of skill in the art. 
[0032] The destination system 204 uses the codec identifiers 208 to select the 
appropriate codecs 110 for decompressing the respective scenes 206. The resulting 
decompressed media signal 116 may then be presented on the display device 118, as 
previously described. 
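Because the precise format of the codec identifiers 208 is left open, the following is only one possible sketch of a compressed media signal 210 in which an identifier precedes each scene 206. The header layout (a four-byte identifier followed by a four-byte payload length) and the function names are hypothetical choices for this example.

```python
import struct
from typing import Iterator, List, Tuple

# Hypothetical header: 4-byte codec identifier, then 4-byte big-endian payload length.
HEADER = struct.Struct(">4sI")


def pack_scenes(scenes: List[Tuple[bytes, bytes]]) -> bytes:
    """Concatenate (codec_id, compressed_payload) pairs into one byte stream."""
    out = bytearray()
    for codec_id, payload in scenes:
        out += HEADER.pack(codec_id, len(payload))
        out += payload
    return bytes(out)


def unpack_scenes(stream: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (codec_id, compressed_payload) pairs so that a destination system
    can choose the matching decompressor for each scene."""
    offset = 0
    while offset < len(stream):
        codec_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        yield codec_id, stream[offset:offset + length]
        offset += length
```

A destination system 204 could then map each identifier returned by `unpack_scenes` to the corresponding codec 110 before decompressing the payload.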

[0033] FIG. 3 illustrates additional details of the source system 202. In one 
embodiment, an input module 302 receives the original media signal 108 from the video 
camera 106 or other source device. An identification module 304 divides the original 
media signal 108 into scenes 206 and identifies various characteristics (not shown) of 
each scene 206, as described in greater detail below. 

[0034] Thereafter, for each scene 206, a selection module 306 uses the 
characteristics (or the scene 206 itself) to select the optimal codec 110 from a codec 
library 308. As used herein, "optimal" means producing the highest compression 
quality for the compressed media signal 210 at a particular target data rate (among 
those codecs 110 within the codec library 308). 

[0035] In one embodiment, a user may specify a particular target data rate, e.g., 128 
kilobits per second (kbps). Alternatively, the target data rate may be determined by the 
available bandwidth or in light of other constraints. 

[0036] The codec library 308 may include a wide variety of codecs 110. Examples of 
possible video codecs 110 are provided in the following table. In addition, various 
audio-only codecs may be provided, such as MPEG Audio Layer 3 (MP3), MPEG-4 
Structured Audio (MP4-SA), CCITT u-Law, Ogg Vorbis, and AC3. Of course, other 
presently-available or yet-to-be-developed codecs 110 may be used within the scope of 
the invention. 



Table 1 

FOURCC | Name | Owner | FOURCC | Name | Owner
--- | --- | --- | --- | --- | ---
3IV1 | 3ivx | 3IVX | MPG4 | MPEG-4 | Microsoft
3IV2 | 3ivx | 3IVX | MPGI | MPEG | Sigma Designs
AASC | Autodesk Animator codec | Autodesk | MRCA | Mrcodec | FAST Multimedia
ADV1 | WaveCodec | Loronix | MRLE | Microsoft RLE | Microsoft
ADVJ | Avid M-JPEG | Avid Technology | MSVC | Microsoft Video 1 | Microsoft
AEMI | Array VideoONE MPEG1-I Capture | Array Microsystems | MSZH | AVImszh | Kenji Oshima
AFLI | Autodesk Animator codec | Autodesk | MTX1 through MTX9 |  | Matrox
AFLC | Autodesk Animator codec | Autodesk | MV12 |  | 
AMPG | Array VideoONE MPEG | Array Microsystems | MWV1 | Aware Motion Wavelets | Aware Inc.
ANIM | RDX | Intel | nAVI |  | 
AP41 | AngelPotion Definitive | AngelPotion | NTN1 | Video Compression 1 | Nogatech
ASV1 | Asus Video | Asus | NVDS | NVidia Texture Format | NVidia
ASV2 | Asus Video (2) | Asus | NVHS | NVidia Texture Format | NVidia
ASVX | Asus Video 2.0 | Asus | NHVU | NVidia Texture Format | NVidia
AUR2 | Aura 2 Codec - YUV 422 | Auravision | NVS0-NVS5 |  | NVidia
AURA | Aura 1 Codec - YUV 411 | Auravision | NVT0-NVT5 |  | NVidia
AVRn | Avid M-JPEG | Avid Technology | PDVC | DVC codec | I-O Data Device, Inc.
BINK | Bink Video | RAD Game Tools | PGVV | Radius Video Vision | Radius
BT20 | Prosumer Video | Conexant | PHMO | Photomotion | IBM
BTCV | Composite Video Codec | Conexant | PIM1 |  | Pegasus Imaging
BW10 | Broadway MPEG Capture/Compression | Data Translation | PIM2 |  | Pegasus Imaging
CC12 | YUV12 Codec | Intel | PIMJ | Lossless JPEG | Pegasus Imaging
CDVC | Canopus DV Codec | Canopus | PIXL | Video XL | Pinnacle Systems
CFCC | DPS Perception | Digital Processing Systems | PVEZ | PowerEZ | Horizons Technology
CGDI | Camcorder Video | Microsoft | PVMM | PacketVideo Corporation MPEG-4 | PacketVideo Corporation
CHAM | Caviara Champagne | Winnov | PVW2 | Pegasus Wavelet Compression | Pegasus Imaging
CMYK | Uncompressed CMYK | Colorgraph | qpeq | QPEG 1.1 | Q-Team
CJPG | WebCam JPEG | Creative Labs | QPEG | QPEG | Q-Team
CPLA | YUV 4:2:0 | Weitek | raw | Raw RGB | 
CRAM | Microsoft Video 1 | Microsoft | RGBT | 32 bit support | Computer Concepts
CVID | Cinepak | Providenza & Boekelheide | RLE | Run Length Encoder | Microsoft
CWLT | Color WLT DIB | Microsoft | RLE4 | 4bpp Run Length Encoder | Microsoft
CYUV | Creative YUV | Creative Labs | RLE8 | 8bpp Run Length Encoder | Microsoft
CYUY |  | ATI Technologies | RMP4 | MPEG-4 AS Profile Codec | Sigma Designs
D261 | H.261 | DEC | RT21 | Real Time Video 2.1 | Intel
D263 | H.263 | DEC | rv20 | RealVideo G2 | Real
DIV3 | DivX MPEG-4 | DivX | rv30 | RealVideo 8 | Real
DIV4 | DivX MPEG-4 | DivX | RVX | RDX | Intel
DIV5 | DivX MPEG-4 | DivX | s422 | VideoCap C210 YUV Codec | Tekram International
DIVX | DivX | OpenDivX | SAN3 | DivX 3 | 
divx | DivX |  | SDCC | Digital Camera Codec | Sun Communications
DMB1 | Rainbow Runner hardware compression | Matrox | SEDG | MPEG-4 | Samsung
DMB2 | Rainbow Runner hardware compression | Matrox | SFMC | Surface Fitting Method | CrystalNet
DSVD | DV Codec |  | SMSC | Proprietary codec | Radius
DUCK | TrueMotion S | Duck Corporation | SMSD | Proprietary codec | Radius
dv25 | DVCPRO | Matrox | smsv | Wavelet Video | WorldConnect (corporate site)
dv50 | DVCPRO50 | Matrox | SP54 |  | SunPlus
dvsd |  | Pinnacle Systems | SPIG | Spigot | Radius
DVE2 | DVE-2 Videoconferencing Codec | InSoft | SQZ2 | VXTreme Video Codec V2 | Microsoft
DVX1 | DVX1000SP Video Decoder | Lucent | SV10 | Video R1 | Sorenson Media
DVX2 | DVX2000S Video Decoder | Lucent | STVA | ST CMOS Imager Data | ST Microelectronics
DVX3 | DVX3000S Video Decoder | Lucent | STVB | ST CMOS Imager Data | ST Microelectronics
DX50 | DivX MPEG-4 version 5 | DivX | STVC | ST CMOS Imager Data (Bunched) | ST Microelectronics
DXTn | DirectX Compressed Texture | Microsoft | STVX | ST CMOS Imager Data | ST Microelectronics
DXTC | DirectX Texture Compression | Microsoft | STVY | ST CMOS Imager Data | ST Microelectronics
ELKO | Elsa Quick Codec | Elsa | SVQ1 | Sorenson Video | Sorenson Media
EKQO | Elsa Quick Codec | Elsa | TLMS | Motion Intraframe Codec | TeraLogic
ESCP | Escape | Eidos Technologies | TLST | Motion Intraframe Codec | TeraLogic
ETV1 | eTreppid Video Codec | eTreppid Technologies | TM20 | TrueMotion 2.0 | Duck Corporation
ETV2 | eTreppid Video Codec | eTreppid Technologies | TM2X | TrueMotion 2X | Duck Corporation
ETVC | eTreppid Video Codec | eTreppid Technologies | TMIC | Motion Intraframe Codec | TeraLogic
FLJP | Field Encoded Motion JPEG | D-Vision | TMOT | TrueMotion S | Horizons Technology
FRWA | Forward Motion JPEG with alpha channel | SoftLab-Nsk | TR20 | TrueMotion RT 2.0 | Duck Corporation
FRWD | Forward Motion JPEG | SoftLab-Nsk | TSCC | TechSmith Screen Capture Codec | Techsmith Corp.
FVF1 | Fractal Video Frame | Iterated Systems | TV10 | Tecomac Low-Bit Rate Codec | Tecomac, Inc.
GLZW | Motion LZW | gabest@freemail.hu | TVJP |  | Pinnacle/Truevision
GPEG | Motion JPEG | gabest@freemail.hu | TVMJ |  | Pinnacle/Truevision
GWLT | Greyscale WLT DIB | Microsoft | TY2C | Trident Decompression | Trident Microsystems
H260 through H269 | ITU H.26n | Intel | TY2N |  | Trident Microsystems
HFYU | Huffman Lossless Codec |  | TYON |  | Trident Microsystems
HMCR | Rendition Motion Compensation Format | Rendition | UCOD | ClearVideo | eMajix.com
HMRR | Rendition Motion Compensation Format | Rendition | ULTI | Ultimotion | IBM Corp.
i263 | ITU H.263 | Intel | V261 | Lucent VX2000S | Lucent
IAN | Indeo 4 Codec | Intel | V655 | YUV 4:2:2 | Vitec Multimedia
ICLB | CellB Videoconferencing Codec | InSoft | VCR1 | ATI Video Codec 1 | ATI Technologies
IGOR | Power DVD |  | VCR2 | ATI Video Codec 2 | ATI Technologies
IJPG | Intergraph JPEG | Intergraph | VCR3-9 | ATI Video Codecs | ATI Technologies
ILVC | Layered Video | Intel | VDCT | VideoMaker Pro DIB | Vitec Multimedia
ILVR | ITU H.263+ Codec |  | VDOM | VDOWave | VDONet
IPDV | Giga AVI DV Codec | I-O Data Device, Inc. | VDOW | VDOLive | VDONet
IR21 | Indeo 2.1 | Intel | VDTZ | VideoTizer YUV Codec | Darim Vision Co.
IRAW | Intel Uncompressed UYUV | Intel | VGPX | VideoGramPix | Alaris
IV30 through IV39 | Indeo 3 | Ligos | VIFP | VFAPI Codec | 
IV32 | Indeo 3.2 | Ligos | VIDS |  | Vitec Multimedia
IV40 through IV49 | Indeo Interactive | Ligos | VIVO | Vivo H.263 | Vivo Software
IV50 | Indeo Interactive | Ligos | VIXL | Video XL | Pinnacle Systems
JBYR |  | Kensington | VLV1 |  | VideoLogic
JPEG | JPEG Still Image | Microsoft | VP30 | VP3 | On2
JPGL | JPEG Light |  | VP31 | VP3 | On2
L261 | Lead H.26 | Lead Technologies | vssv | VSS Video | Vanguard Software Solutions
L263 | Lead H.263 | Lead Technologies | VX1K | VX1000S Video Codec | Lucent
LCMW | Motion CMW Codec | Lead Technologies | VX2K | VX2000S Video Codec | Lucent
LEAD | LEAD Video Codec | Lead Technologies | VXSP | VX1000SP Video Codec | Lucent
LGRY | Grayscale Image | Lead Technologies | VYU9 | ATI YUV | ATI Technologies
Ljpg | LEAD MJPEG Codec | Lead Technologies | VYUY | ATI YUV | ATI Technologies
LZ01 | Lempel-Ziv-Oberhumer Codec | Markus Oberhumer | WBVC | W9960 | Winbond Electronics
M263 | H.263 | Microsoft | WHAM | Microsoft Video 1 | Microsoft
M261 | H.261 | Microsoft | WINX | Winnov Software Compression | Winnov
M4S2 | MPEG-4 (automatic WMP download) | Microsoft | WJPG | Winbond JPEG | 
MC12 | Motion Compensation Format | ATI Technologies | WNV1 | Winnov Hardware Compression | Winnov
MCAM | Motion Compensation Format | ATI Technologies | x263 |  | Xirlink
MJ2C | Motion JPEG 2000 | Morgan Multimedia | XVID | XVID MPEG-4 | XVID
mJPG | Motion JPEG including Huffman Tables | IBM | XLVO | XL Video Decoder | NetXL Inc.
MJPG | Motion JPEG |  | XMPG | XING MPEG | XING Corporation
MMES | MPEG-2 ES | Matrox | XWV0-XWV9 | XiWave Video Codec | XiWave
MP2A | Eval download | Media Excel | XXAN |  | Origin
MP2T | Eval download | Media Excel | Y411 | YUV 4:1:1 | Microsoft
MP2V | Eval download | Media Excel | Y41P | Brooktree YUV 4:1:1 | Conexant
MP42 | MPEG-4 (automatic WMP download) | Microsoft | Y8 | Grayscale video | 
MP43 | MPEG-4 (automatic WMP download) | Microsoft | YC12 | YUV 12 codec | Intel
MP4A | Eval download | Media Excel | YUV8 | Caviar YUV8 | Winnov
MP4S | MPEG-4 (automatic WMP download) | Microsoft | YUY2 | Raw, uncompressed YUV 4:2:2 | Microsoft
MP4T | Eval download | Media Excel | YUYV |  | Canopus
MP4V | Eval download | Media Excel | ZLIB |  | 
MPEG | MPEG |  | ZPEG | Video Zipper | Metheus
MPG4 | MPEG-4 (automatic WMP download) | Microsoft | ZyGo | ZyGoVideo | ZyGo Digital

[0037] Those of skill in the art will recognize that many of the above-described 
codecs may be deemed "generalist" codecs in that they achieve a high compression 
quality for a wide variety of media signals and conditions. However, other codecs may 
be deemed "specialist" codecs because they compress certain types of media signals 
well or compress many types of media signals well under certain conditions. Providing 
a codec library 308 that includes a variety of both generalist and specialist codecs, 
including codecs of different families, typically results in the best overall compression 
quality for a compressed media signal 210. 

[0038] Referring again to FIG. 3, after a codec 110 is selected for a scene 206, a 
compression module 310 compresses the scene 206 using the selected codec 110. An 
output module 312 receives the resulting compressed media signal 210 and, in one 
embodiment, adds codec identifiers 208 to indicate which codecs 110 were used to 
compress each scene 206. In other embodiments, the codec identifiers 208 may be 
added by the compression module 310 or at other points in the compression process. 
The output module 312 then delivers the compressed media signal 210 to the 
destination system 204 via the network 114. 

[0039] The embodiment of FIG. 3 is primarily applicable to streaming media 
applications, including video conferencing. In an alternative embodiment, as depicted in 
FIG. 4, the output module 312 may be coupled to a storage device 402, such as CD or 
DVD recorder, flash card writer, or the like. As depicted, the compressed media signal 
210 (and codec identifiers 208) may be stored on an appropriate storage medium 404, 
which is physically delivered to the destination system 204. In such an embodiment, 
the destination system 204 would include a media reader (not shown) for reading the 
compressed media signal 210 from the storage medium 404. 

[0040] Unlike conventional media compression techniques, the original media signal 
108 is not compressed using a single codec (e.g., MPEG-2 as in DVDs). Rather, each 
scene 206 is automatically compressed using the best codec 110 selected from a codec 
library 308 for that scene 206. Using the above-described technique, between 10 and 12 
hours of DVD-quality video may be stored on a single recordable DVD. 
[0041] FIG. 5 illustrates additional details of the selection module 306. As noted 
above, the identification module 304 receives the original media signal 108 and 
identifies individual scenes 206, as well as characteristics 502 of each scene 206. The 
characteristics 502 may include, for instance, motion characteristics, color 
characteristics, YUV signal characteristics, color grouping characteristics, color dithering 
characteristics, color shifting characteristics, lighting characteristics, and contrast 
characteristics. Those of skill in the art will recognize that a wide variety of other 
characteristics of a scene 206 may be identified within the scope of the invention. 
[0042] Motion is composed of vectors resulting from object detection. Relevant 
motion characteristics may include, for example, the number of objects, the size of the 
objects, the speed of the objects, and the direction of motion of the objects. 
[0043] With respect to color, each pixel typically has a range of values for red, green, 
blue, and intensity. Relevant color characteristics may include how the ranges of values 
change through the frame set, whether some colors occur more frequently than other 
colors (selection), whether some color groupings shift within the frame set, and whether 
differences between one grouping and another vary greatly across the frame set 
(contrast). 

[0044] In one embodiment, an artificial intelligence (AI) system 504, such as a neural 
network or expert system, receives the characteristics 502 of the scene 206, as well as 
a target data rate 506 for the compressed media signal 210. The AI system 504 then 
determines whether a codec 110 exists in the library 308 that has previously been found 
to optimally compress a scene 206 with the given characteristics 502 at the target data 
rate 506. As explained below, the AI system 504 may be conceptualized as "storing" 
associations between sets of characteristics 502 and optimal codecs 110. If an 
association is found, the selection module 306 outputs the codec 110 (or an indication 
thereof) as the "selected" codec 110. 

[0045] In many cases, a scene 206 having the specified characteristics 502 may not 
have been previously encountered. Accordingly, the selection module 306 makes a 
copy of the scene 206, referred to herein as a baseline snapshot 508, which serves as a 
reference point for determining compression quality. 

[0046] Thereafter, a compression module 510 tests different codecs 110 from the 
codec library 308 on the scene 206. In one embodiment, the compression module 510 
is also the compression module 310 of FIG. 3. As depicted, the compression module 
510 compresses the scene 206 using different codecs 110 at the target data rate 506 to 
produce multiple compressed test scenes 512. 

[0047] The codecs 110 may be tested sequentially, at random, or in other ways, and 
all of the codecs 110 in the library need not be tested. In one embodiment, input from 
the AI system 504 may assist with selecting a subset of the codecs 110 from the library 
308 for testing. In some cases, a time limit may be imposed for codec testing in order to 
facilitate real-time compression. Thus, when the time limit is reached, no additional 
compressed test scenes 512 are generated. 
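As a rough sketch of the time-limited testing described above, the loop below stops generating compressed test scenes 512 once a deadline passes. The function name, the dictionary of compression callables, and the injectable clock are hypothetical; the choice to always test at least one codec is likewise an assumption of this sketch and is not stated in the source.

```python
import time
from typing import Any, Callable, Dict


def compress_with_time_limit(
    scene: Any,
    codecs: Dict[str, Callable[[Any], Any]],
    time_limit_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Run each codec's compress function on the scene until the time limit is reached."""
    deadline = clock() + time_limit_s
    test_scenes: Dict[str, Any] = {}
    for name, compress in codecs.items():
        # Assumption: at least one codec is always tested, even with a tiny limit.
        if test_scenes and clock() >= deadline:
            break
        test_scenes[name] = compress(scene)
    return test_scenes
```

Codecs earlier in the dictionary are tested first, so an ordering suggested by the AI system 504 could be expressed simply by the insertion order of `codecs`.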

[0048] In one embodiment, a comparison module 514 compares the compression 
quality of each compressed test scene 512 with the baseline snapshot 508 according to 
a set of criteria 516. The criteria 516 may be based on a comparison of Peak Signal to 
Noise Ratios (PSNRs), which may be calculated, for an M x N frame, by: 

$$\mathrm{PSNR} = 20 \times \log_{10}\left(\frac{255}{\sqrt{\frac{1}{M \times N}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\left[f'(m,n) - f(m,n)\right]^{2}}}\right) \qquad \text{Eq. 1}$$

where f is the original frame and f' is the decompressed frame. 
Alternatively, Root Mean Square Error (RMSE), Signal to Noise Ratio (SNR), or other 
objective quality metrics may be used as known to those of skill in the art. 

[0049] In certain embodiments, a Just Noticeable Difference (JND) image quality 
metric calculation may be used. JND is a robust objective picture quality measurement 
method known to those skilled in the art. It includes three dimensions for evaluation of 
dynamic and complex motion sequences: spatial analysis, temporal analysis, and full 
color analysis. By using a model of the human visual system in a picture differencing 
process, JND produces results that are independent of the compression process and 
resulting artifacts. 

[0050] In one embodiment, the comparison module 514 automatically selects the 
codec 110 used to generate the compressed test scene 512 that has the highest 
compression quality when compared to the baseline snapshot 508 according to the set 
of criteria 516. That codec 110 (or an indication thereof) is then output by the selection 
module 306 as the selected codec 110. 
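For illustration, the PSNR-based comparison and the selection of the highest-quality codec might be sketched as follows. The sketch assumes 8-bit samples (peak value 255), a single frame per scene represented as a list of rows, and hypothetical function names; it is not presented as the comparison module 514 itself.

```python
import math
from typing import Dict, List

Frame = List[List[int]]  # M rows by N columns of 8-bit sample values


def psnr(original: Frame, decoded: Frame) -> float:
    """Peak signal-to-noise ratio, in dB, between an original and a decoded frame."""
    m, n = len(original), len(original[0])
    squared_error = sum(
        (decoded[i][j] - original[i][j]) ** 2 for i in range(m) for j in range(n)
    )
    if squared_error == 0:
        return math.inf  # identical frames: no noise at all
    rmse = math.sqrt(squared_error / (m * n))
    return 20 * math.log10(255 / rmse)


def select_codec(baseline: Frame, test_scenes: Dict[str, Frame]) -> str:
    """Return the name of the codec whose decoded test scene has the highest PSNR."""
    return max(test_scenes, key=lambda name: psnr(baseline, test_scenes[name]))
```

Other criteria 516, such as RMSE or a JND metric, could be substituted by replacing `psnr` with a different scoring function.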

[0051] The comparison module 514 tells the AI system 504 which codec 110 was 
selected for the scene 206. This allows the AI system 504 to make an association 
between the identified characteristics 502 of the scene 206 and the selected codec 110. 
Thus, in the future, the AI system 504 may automatically select the codec 110 for a 
similar scene 206 without the need for retesting by the comparison module 514. 
[0052] Referring also to FIG. 3, in one configuration, the highest-quality compressed 
test scene 512a is simply passed to the output module 312 (not shown) to be included 
in the compressed media signal 210. However, the compression module 310 could 
recompress the scene 206 using the selected codec 110 in certain embodiments. 
[0053] FIG. 6 provides an example of the above-described process. Suppose that 
the identification module 304 finds a scene 206a having a particular set of 
characteristics 502a. In one embodiment, the AI system 504 searches for an association 
602 between the characteristics 502a and a particular codec 110. While the AI system 
504 is depicted as including characteristics 502, associations 602, and codecs 110, 
those skilled in the art will recognize that these entities may be represented by codes, 
hashes, or other identifiers in various implementations. 

[0054] Assuming that no such association 602 is found, a baseline snapshot 508 of 
the scene 206a is taken. In addition, the compression module 510 compresses the 
scene 206a at the target data rate 506 using a number of different codecs 110a-c from 
the codec library 308 to create a plurality of compressed test scenes 512a-c. These 
test scenes 512a-c are then compared against the baseline snapshot 508 according to 
a set of criteria 516, e.g., PSNR. 
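
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes each compressed test scene 512 has already been decompressed into a flat sequence of 8-bit samples, and the names `psnr` and `select_best`, as well as the dictionary layout, are hypothetical rather than part of the disclosed system.

```python
import math


def psnr(baseline, decoded, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length sample sequences."""
    if len(baseline) != len(decoded) or not baseline:
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean squared error between the baseline snapshot and the decoded test scene.
    mse = sum((a - b) ** 2 for a, b in zip(baseline, decoded)) / len(baseline)
    if mse == 0:
        return math.inf  # identical signals: no measurable degradation
    return 20.0 * math.log10(peak / math.sqrt(mse))


def select_best(baseline, test_scenes):
    """Return the label of the test scene with the highest PSNR.

    test_scenes maps a codec label (hypothetical) to its decoded samples.
    """
    return max(test_scenes, key=lambda label: psnr(baseline, test_scenes[label]))
```

With a baseline of `[10, 20, 30, 40]` and three decoded candidates that differ from it by increasing amounts, `select_best` returns the label of the candidate with the smallest error.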

[0055] Suppose that the compressed test scene 512a produced by one codec 110a 
("Codec 1") results in the highest compression quality, e.g., the highest PSNR. In such 
a case, the comparison module 514 would inform the AI system 504 so that an 
association 602 could be made between the characteristics 502a of the scene 206a and 
the selected codec 110a. Thus, if a scene 206 having the same characteristics 502a is 
encountered in the future, the AI system 504 could simply identify the optimal codec 
110a without the need for retesting. 
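
The "test once, then reuse the association" behavior can be illustrated with a simple lookup table. The sketch below is an assumption-laden stand-in: it uses a plain dictionary keyed on exactly matching characteristics instead of a trained AI system, and the class name `AssociationStore`, the `test_codecs` callback, and the dictionary-based representation of characteristics are all hypothetical.

```python
class AssociationStore:
    """Remembers which codec was selected for a given set of scene characteristics."""

    def __init__(self, test_codecs):
        # test_codecs: callable that runs the (expensive) compress-and-compare
        # test for a set of characteristics and returns the winning codec label.
        self._test_codecs = test_codecs
        self._associations = {}

    def codec_for(self, characteristics):
        # Characteristics are given as a dict; freeze them into a hashable key
        # so that the order of the entries does not matter.
        key = frozenset(characteristics.items())
        if key not in self._associations:
            # No association found: test the codecs and record the result.
            self._associations[key] = self._test_codecs(characteristics)
        return self._associations[key]
```

A second request with the same characteristics is answered from the stored association, so the test callback runs only once per distinct set of characteristics.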

[0056] Referring to FIG. 7, the AI system 504 may be implemented using a typical 
feedforward neural network 700 comprising a plurality of artificial neurons 702. A 
neuron 702 receives a number of inputs (either from original data, or from the output of 
other neurons in the neural network 700). Each input comes via a connection that has a 
strength (or "weight"); these weights correspond to synaptic efficacy in a biological 
neuron. Each neuron 702 also has a single threshold value. The weighted sum of the 
inputs is formed, and the threshold subtracted, to compose the "activation" of the 
neuron 702 (also known as the post-synaptic potential, or PSP, of the neuron 702). The 
activation signal is passed through an activation function (also known as a transfer 
function) to produce the output of the neuron 702. 

[0057] As illustrated, a typical neural network 700 has neurons 702 arranged in a 
distinct layered topology. The "input" layer 704 is not composed of neurons 702, per se. 
These units simply serve to introduce the values of the input variables (i.e., the scene 
characteristics 502). Neurons 702 in the hidden 706 and output 708 layers are each 
connected to all of the units in the preceding layer. 

[0058] When the network 700 is executed, the input variable values are placed in the 
input units, and then the hidden and output layer units are progressively executed. 
Each of them calculates its activation value by taking the weighted sum of the outputs of 
the units in the preceding layer, and subtracting the threshold. The activation value is 
passed through the activation function to produce the output of the neuron 702. When 
the entire neural network 700 has been executed, the outputs of the output layer 708 
act as the output of the entire network 700 (i.e., the selected codec 110). 
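
The execution of such a network can be sketched as follows. This is a minimal illustration under stated assumptions: the logistic function is chosen here as the activation function only for concreteness (the text does not name one), and the function names and the list-based layout of weights and thresholds are hypothetical.

```python
import math


def logistic(x):
    """An assumed activation (transfer) function; others could be substituted."""
    return 1.0 / (1.0 + math.exp(-x))


def run_layer(inputs, weights, thresholds, activation=logistic):
    """Compute the outputs of one fully connected layer.

    weights[k] holds the connection weights of unit k, one per input;
    thresholds[k] is the single threshold value of unit k.
    """
    outputs = []
    for unit_weights, threshold in zip(weights, thresholds):
        # Activation value: weighted sum of the inputs minus the threshold.
        activation_value = sum(w * x for w, x in zip(unit_weights, inputs)) - threshold
        outputs.append(activation(activation_value))
    return outputs


def run_network(inputs, layers):
    """Execute hidden and output layers in order; layers is a list of (weights, thresholds)."""
    values = list(inputs)  # input units simply introduce the input values
    for weights, thresholds in layers:
        values = run_layer(values, weights, thresholds)
    return values
```

One plausible way to read the result, again an assumption, is to give the output layer one unit per codec and treat the index of the largest output as the selected codec.
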

[0059] While a feedforward neural network 700 is depicted in FIG. 7, those of skill in 
the art will recognize that other types of neural networks 700 may be used, such as 
feedback networks, Back-Propagated Delta Rule Networks (BP) and Radial Basis 
Function Networks (RBF). In other embodiments, an entirely different type of AI system 
504 may be used, such as an expert system. 

[0060] In still other embodiments, the AI system 504 may be replaced by lookup 
tables, databases, or other data structures that are capable of searching for a codec 
110 based on a specified set of characteristics 502. Thus, the invention should not be 
construed as requiring an AI system 504. 

[0061] Referring to FIG. 8, the invention is not limited to embodiments in which 
different codecs 110 are used to respectively encode different scenes 206 of an original 
media signal 108. As illustrated, a single codec 110 may be used in one embodiment. 
However, different settings 804 (parameters) for the codec 110 may be automatically 
selected in much the same way that different codecs 110 were selected in the 
preceding embodiments. 

[0062] As used herein, codec settings 804 refer to standard parameters such as the 
motion estimation method, the GOP size (keyframe interval), types of transforms (e.g., 
DCT vs. wavelet), noise reduction for luminance or chrominance, decoder deblocking 
level, preprocessing/postprocessing filters (such as sharpening and denoising), etc. 

[0063] As before, suppose that the identification module 304 finds a scene 206a 
having a given set of characteristics 502a. In one embodiment, the AI system 504 
searches for an association 802 between the characteristics 502a and one or more 
settings 804a for the codec 110. 

[0064] Assume that no such association 802 is found. In one configuration, a 
baseline snapshot 508 of the scene 206a is taken. In addition, the compression module 
510 compresses the scene 206a at the target data rate 506 using the same codec 110 
but with different settings 804a-c. The resulting compressed test scenes 512a-c are 
then compared against the baseline snapshot 508 according to a set of criteria 516, 
e.g., PSNR. 

[0065] Suppose that the compressed test scene 512a produced by one group of 
settings 804a ("Settings 1") results in the highest compression quality, e.g., the highest 
PSNR. In such a case, the comparison module 514 would inform the AI system 504, so 
that an association 802 could be made between the characteristics 502a of the scene 
206a and the selected group of settings 804a. Accordingly, if a scene 206 having the 
same characteristics 502a is encountered in the future, the AI system 504 could simply 
identify the optimal settings 804a without the need for retesting. 

[0066] In still other embodiments, the AI system 504 may search for both different 
codecs 110 and different codec settings 804 based on a given set of characteristics 
502. Likewise, the compression module 510 may generate compressed test scenes 
512 based on combinations of different codecs 110 and different settings 804. The 
comparison module 514 may then select the best combination of codec 110 and 
settings 804 for a given scene 206. 
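
Testing combinations of codecs and settings amounts to searching the Cartesian product of the two sets. The sketch below illustrates this under assumptions: `best_combination` and the `quality_of` callback (which stands in for compressing the scene and scoring it, e.g., by PSNR) are hypothetical names introduced for illustration.

```python
from itertools import product


def best_combination(codecs, settings_groups, quality_of):
    """Return the (codec, settings) pair with the highest quality score.

    quality_of(codec, settings) is assumed to compress the scene with that
    combination and return a comparable quality score such as PSNR.
    """
    candidates = product(codecs, settings_groups)
    return max(candidates, key=lambda pair: quality_of(*pair))
```

Because every pairing is tested, the number of test compressions grows as the product of the number of codecs and the number of settings groups, which is one reason reusing stored associations may be attractive.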

[0067] In one embodiment, as shown in FIG. 9, the comparison module 514 may 
consider other factors in addition to (or in lieu of) compression quality in determining 
which codec 110 and/or settings 804 to automatically select for a particular scene 206. 
For instance, the use of certain codecs 110 may incur licensing costs 902 based on 
patents or other intellectual property rights. The licensing costs 902 may be tied to the 
number of times the codec 110 is used, the amount of data compressed using the 
codec 110, or in other ways. 

[0068] While one codec 110 may provide an exceptionally high compression quality 
(e.g., PSNR), its licensing cost 902 may exceed the value of the transmission, in which 
case its use would not be cost-justified. Indications of the licensing costs 902 for various 
codecs 110 may be stored within the codec library 308 or at other locations accessible 
by the comparison module 514. 

[0069] In one embodiment, the licensing costs 902 are considered only when a 
number of the top codecs 110 produce similar results, e.g., the compression qualities 
differ by no more than a threshold amount. In the example of FIG. 9, the first three 
codecs 110 produce output of similar quality. However, the codec 110 with the highest 
PSNR score is more than two times more expensive than the codec 110 with the next 
highest PSNR score, which is, itself, almost three times more expensive than the codec 
110 with the third highest PSNR score. In one configuration, the comparison module 
514 would select the codec 110 with the third highest PSNR score due to its much lower 
licensing cost 902. 
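
The threshold-based use of licensing costs can be sketched as follows. This is only one possible reading: the function name `select_with_cost`, the tuple layout, and the default 1 dB margin are assumptions, and any scores or costs passed to it are made-up values, not those of FIG. 9.

```python
def select_with_cost(results, quality_margin_db=1.0):
    """Pick the cheapest codec among those whose PSNR is close to the best.

    results: list of (codec_label, psnr_db, licensing_cost) tuples.
    quality_margin_db: assumed threshold within which qualities count as similar.
    """
    best_psnr = max(psnr_db for _, psnr_db, _ in results)
    # Keep only codecs whose quality is within the margin of the best score.
    contenders = [r for r in results if best_psnr - r[1] <= quality_margin_db]
    # Among those, the licensing cost decides.
    return min(contenders, key=lambda r: r[2])[0]
```

If only one codec falls within the margin, cost plays no role and the highest-quality codec is returned, which matches the idea that cost is considered only when the top results are similar.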

[0070] In other embodiments, the comparison module 514 may create a composite 
score (not shown) based on the PSNR score, the licensing cost 902, and other possible 
factors. In still other embodiments, the comparison module 514 may calculate an 
anticipated cost (not shown) for the entire transmission and seek to minimize that cost 
over all of the codec selection decisions. Hence, the comparison module 514 might 
select a more expensive codec 110 for certain scenes 206, where a substantial 
increase in quality is realized, while selecting less expensive codecs 110 for other 
scenes. 

[0071] Referring to FIG. 10, a user of the source system 202 may specify a particular 
target data rate 506, e.g., 512 kbps, for video communication. However, there is no 
guarantee that the destination system 204 will be able to process data that quickly. 
Moreover, there is no guarantee that the network 114 will always provide the same 
amount of bandwidth. As a result, there may be a need to periodically change the 
target data rate 506 within the selection module 306 of the source system 202, since the 
target data rate 506 will affect which codecs 110 are selected for various scenes 206. 

[0072] For example, as shown in FIG. 10, the destination system 204 may be 
embodied as a video-enabled cellular telephone. Typically, the bandwidth over cellular 
networks 114 is limited. Similarly, the processing power of a cellular telephone is 
substantially less than that of a personal computer or dedicated video conferencing 
system. 

[0073] Thus, although the user of the source system 202 specifies a target data rate 
506 of 512 kbps, the destination system 204 and/or network 114 may not be up to the 
challenge. In one embodiment, in response to receiving a connection request, the 
destination system 204 provides the source system 202 with a modified target data rate 
1002, e.g., 128 kbps. The modified rate 1002 may be communicated to the source 
system 202 using any standard data structure or technique. Thereafter, depending on 
the configuration, the target data rate 506 may be replaced by the modified rate 1002. 

[0074] In certain embodiments, an actual data rate is not communicated. Rather, a 
message is sent specifying one or more constraints or capabilities of the destination 
system 204 or network 114, in which case it would be up to the source system 202 to 
revise the target data rate 506 as appropriate. A technique of altering the target data 
rate 506 in response to various conditions is referred to herein as "dynamic streaming." 

[0075] In one embodiment, dynamic streaming may be employed where no specific 
message is sent by the destination system 204. The source system 202 may use latency 
calculations, requests to resend lost packets, etc., to dynamically determine the target 
data rate 506 for purposes of codec and/or parameter selection. 
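
One way the source system might revise the target data rate from observed resend requests is sketched below. The policy shown (halve the rate when losses are high, otherwise creep back up toward the user-specified rate) is an assumption chosen for illustration, not the disclosed method; the function name and every numeric default are hypothetical.

```python
def revise_target_rate(current_kbps, ceiling_kbps, resend_ratio,
                       loss_threshold=0.02, backoff=0.5,
                       step_kbps=16, floor_kbps=32):
    """Return a revised target data rate in kbps.

    current_kbps: the rate currently used for codec selection.
    ceiling_kbps: the rate originally specified by the user.
    resend_ratio: fraction of packets for which a resend was requested.
    """
    if resend_ratio > loss_threshold:
        # Too many resend requests: back off, but never below the floor.
        return max(floor_kbps, int(current_kbps * backoff))
    # Conditions look acceptable: probe upward, capped at the user's rate.
    return min(ceiling_kbps, current_kbps + step_kbps)
```

For instance, starting from 512 kbps with a 10% resend ratio, this policy drops the target to 256 kbps, and the revised rate would then feed back into codec and parameter selection.
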

[0076] In one configuration, as shown in FIG. 11, video frames 1102 within a scene 
206 may be subdivided into a plurality of sub-frames 1104. While the depicted video 
frame 1102 is subdivided into four sub-frames 1104a-d of equal size, the invention is 
not limited in this respect. For instance, a video frame 1102 may be subdivided into any 
number of sub-frames 1104, although too many sub-frames 1104 may adversely affect 
compression quality. Moreover, the sub-frames 1104 need not be of equal size. For 
example, sub-frames 1104 near the center of the video frame 1102 may be smaller due 
to the relatively greater amount of motion in this area. 
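
The equal-size subdivision can be sketched as a simple grid split. The example assumes a frame represented as a list of rows of samples; the function name `split_into_subframes` and its parameters are hypothetical, and unequal or object-based sub-frames are not covered by this sketch.

```python
def split_into_subframes(frame, rows=2, cols=2):
    """Split a frame (list of rows of samples) into rows x cols sub-frames.

    Sub-frames are returned in row-major order (top-left first).
    """
    height, width = len(frame), len(frame[0])
    # Integer edges so that the sub-frames tile the frame without gaps.
    row_edges = [height * r // rows for r in range(rows + 1)]
    col_edges = [width * c // cols for c in range(cols + 1)]
    return [
        [line[col_edges[c]:col_edges[c + 1]]
         for line in frame[row_edges[r]:row_edges[r + 1]]]
        for r in range(rows)
        for c in range(cols)
    ]
```

With the defaults, a frame is divided into four quadrants, each of which could then be characterized and compressed independently.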

[0077] In certain embodiments, the sub-frames 1104 may be defined by objects 
represented within the video frame 1102. As an example, the head of a person could 
be defined as a separate object and, hence, a different sub-frame 1104 from the 
background. Algorithms (e.g., MPEG-4) for objectifying a scene within a video frame 
1102 are known in the art. 

[0078] A set of sub-frames 1104a-d within a scene 206 exhibit characteristics 502a-d, 
and may be treated, for practical purposes, like a complete video frame 1102. 
Accordingly, using the techniques described above, the characteristics 502a-d may be 
used to determine an optimal codec 110a-d for compressing the respective 
sub-frames 1104a-d. For example, an AI system 504 (not shown) may be used to 
determine whether an association 602 exists between a set of characteristics 502 and a 
particular codec 110. If no association 602 exists, compression 510 and comparison 
514 modules (not shown) may be used to test a plurality of codecs 110 on the 
respective sub-frames 1104 to determine the optimal codec 110. 

[0079] Thus, different sub-frames 1104a-d of a single scene 206 may be 
compressed using different codecs 110a-d. In the illustrated embodiment, four different 
codecs 110a-d are used. 

[0080] While specific embodiments and applications of the present invention have 
been illustrated and described, it is to be understood that the invention is not limited to 
the precise configuration and components disclosed herein. Various modifications, 
changes, and variations apparent to those of skill in the art may be made in the 
arrangement, operation, and details of the methods and systems of the present 
invention disclosed herein without departing from the spirit and scope of the present 
invention. 

What is claimed is: 